A real estate company plans to purchase properties in New York to rent out short-term as part of its business model. The company has already concluded that two-bedroom properties are the most profitable; however, it does not know which zip codes are the best to invest in.
The objective of this Data Challenge is to analyze the Airbnb and Zillow datasets and suggest the most profitable zipcodes in New York where 2-bedroom properties can be purchased and rented out short-term.
Zillow dataset (cost data): Provides selling cost for 2 bedroom properties in each zipcode for various cities. The cost information is available from April 1996 to June 2017.
Airbnb dataset (revenue data): Information about property listings in New York, including location, number of bedrooms, reviews, price, availability, property description, etc. Airbnb is the medium through which the real estate company plans to lease out its investment property.
Data Loading: Loading the required dataset
Data Quality: Real-world datasets contain several inconsistencies that will be dealt with, and the data needs to be filtered for NY.
Data Munging: This section concentrates on linking and making the data homogeneous in terms of units, in a scalable manner.
Visual Analysis: This section provides graphs to help us analyze the most profitable zipcodes.
Conclusions: This section provides further business insights into how the client can achieve greater profits by employing the suggested strategies.
Selling cost information for 2-bedroom properties is available only until June 2017. I fill in the missing cost values by imputing them using LOCF and NOCB (last observation carried forward and next observation carried backward), two standard imputation methods. This will help if we get a specific time stamp in the Airbnb data.
I did not forecast the cost to a later time point in 2019.
# Store required packages in a vector
pkgs <- c("data.table", "dplyr", "ggplot2", "tidyr", "naniar",
          "GGally", "Matrix", "plotly", "ggthemes")
# If package is not present, install the packages
for (pkg in pkgs) {
if (!(pkg %in% installed.packages()[, "Package"])) {
install.packages(pkg)
}
}
library("data.table")
library("dplyr")
## Warning: package 'dplyr' was built under R version 3.5.2
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 3.5.2
library("tidyr")
## Warning: package 'tidyr' was built under R version 3.5.2
library("ggthemes")
## Warning: package 'ggthemes' was built under R version 3.5.2
library("GGally")
library("Matrix")
library("plotly")
## Warning: package 'plotly' was built under R version 3.5.2
library("naniar")
The Airbnb dataset to be loaded is listings.csv, and the Zillow dataset is Zip_Zhvi_2bedroom.csv.
airbnbDir <- "listings.csv" # path to the Airbnb csv
zillowDir <- "Zip_Zhvi_2bedroom.csv" # path to the Zillow csv
Load the files, replacing empty cell values with NA; this helps with data cleaning later on.
airbnb <- fread(airbnbDir,header = TRUE, sep = "," , stringsAsFactors = FALSE, na.strings = c("","NA"))
zillow <- fread(zillowDir,header = TRUE, sep = "," , stringsAsFactors = FALSE, na.strings = c("","NA"))
Last observation carried forward imputation function: this function helps us fill missing historical cost values in the Zillow data.
na.locf.default <- function(x) {
  # carry the most recent non-NA value forward over each run of NAs
  v <- !is.na(x)
  return(c(NA, x[v])[cumsum(v) + 1])
}
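A quick sanity check of the function on a toy vector (illustrative values only): NOCB can be obtained from the same function by reversing the vector, applying LOCF, and reversing back.

```r
x <- c(NA, 3, NA, NA, 5, NA)
na.locf.default(x)            # LOCF: NA 3 3 3 5 5 (leading NA stays)
rev(na.locf.default(rev(x)))  # NOCB: 3 3 5 5 5 NA (trailing NA stays)
```

Applying LOCF first and NOCB second, as done below, therefore fills every gap except a series that is entirely NA.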
# percentage of missing values in each zillow column
ds0 <- data.frame(MissingCounts = apply(zillow, 2, function(x) sum(is.na(x)) / nrow(zillow) * 100))
summary(ds0$MissingCounts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.9474 13.8721 11.3655 17.2815 29.7563
Missingness is reasonable for the overall dataset. The house-price columns have missing values at some time snapshots, but that is not an issue if the missingness is random across time, since it can be handled with imputation.
# sum of observed monthly prices per zipcode row (columns 8:262 are the time points)
ds1 <- rowSums(zillow[, c(8:262)], na.rm = TRUE)
summary(ds1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2109800 19103100 29690900 36939674 44956950 391390500
No NAs found, which means every zipcode has at least one observed value at some point in time; hence this is not a concern.
# minimum observed price per zipcode row
ds2 <- apply(zillow[, c(8:262)], 1, min, na.rm = TRUE)
summary(ds2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22400 60800 82100 98650 112575 1428000
Result: as expected, prices are strictly positive.
zillow_filtered <- zillow[which(zillow$State =="NY"),]
nrow(zillow_filtered)
## [1] 475
Changing the zillow column name "RegionName" to "zipcode"; this will help in merging later.
setnames(zillow_filtered,"RegionName", "zipcode")
dim(airbnb)
## [1] 48895 106
table(airbnb$state)
##
## CA MP New York NJ ny Ny NY
## 3 2 1 1 2 3 7 48873
airbnb$state <- (gsub("New York","NY",airbnb$state))
airbnb$state <- (gsub("ny","NY",airbnb$state))
airbnb$state <- (gsub("Ny","NY",airbnb$state))
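The three substitutions above could also be collapsed into a single case-insensitive replacement; a minimal sketch on a toy vector (not part of the original analysis):

```r
states <- c("New York", "ny", "Ny", "NY", "NJ", "CA")
# anchor the pattern so only whole-string matches are normalized
normalized <- gsub("^(new york|ny)$", "NY", states, ignore.case = TRUE)
normalized  # "NY" "NY" "NY" "NY" "NJ" "CA"
```

Anchoring with `^...$` avoids accidentally rewriting substrings of other state values.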
airbnb_filtered <- airbnb[which(airbnb$state=="NY" & airbnb$bedrooms == 2),]
nrow(airbnb_filtered)
## [1] 6496
Changing the column type to character for easier merging and uniform structure
zillow_filtered$zipcode <- as.character(zillow_filtered$zipcode)
airbnb_filtered$zipcode <- as.character(airbnb_filtered$zipcode)
This section covers the data manipulations and new variable creation needed for a scalable, accurate approach to handling larger data spanning multiple revenue snapshots (Airbnb).
Please note: currently the Airbnb data is available for a single snapshot in 2019. The real-world problem can involve multiple time stamps, so while merging datasets we need to match cost and revenue by zipcode as well as by the snapshot for which they are applicable.
Derive insights from the raw and derived fields.
# melt zillow_filtered so each row holds the price at one time point.
# this allows merging prices and costs accurately at the zipcode and time level.
zillow_filLong <- melt(zillow_filtered,id=c("RegionID","zipcode","City","State","Metro","CountyName","SizeRank"))
setnames(zillow_filLong,"variable", "time")
setnames(zillow_filLong,"value", "cost")
# summary of cost in long data
summary(zillow_filLong$cost)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 32700 74900 109100 175505 194400 3347100 13985
# filter relevant columns from zillow_filLong
zillow_filLong<- zillow_filLong[,c("zipcode", "time", "cost")]
Why do this? It helps scale the analysis over a longer window of data, capturing house prices at different times and matching them to rentals by merging on the combination of zipcode and month-year.
LOCF and NOCB (last observation carried forward and next observation carried backward) are two standard ways to achieve such imputations.
# LOCF: sort forward in time and carry values forward within each zipcode
zillow_filLong <- zillow_filLong[order(zipcode, time)]
zillow_filLong <- zillow_filLong[, cost := na.locf.default(cost), by = .(zipcode)]
# NOCB: reverse the time order so the same function carries values backward
zillow_filLong <- zillow_filLong[order(zipcode, -time)]
zillow_filLong <- zillow_filLong[, cost := na.locf.default(cost), by = .(zipcode)]
summary(zillow_filLong$cost)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32700 75400 109900 183091 200400 3347100
Great! Now we have cost points for all times and zipcodes.
table(airbnb_filtered$last_scraped,useNA = "ifany")
##
## 2019-07-08 2019-07-09
## 4738 1758
table(airbnb_filtered$calendar_last_scraped,useNA = "ifany")
##
## 2019-07-08 2019-07-09
## 4738 1758
table(airbnb_filtered$calendar_updated,useNA = "ifany")
##
## 1 week ago 10 months ago 11 months ago 12 months ago 13 months ago
## 52 42 51 46 40
## 14 months ago 15 months ago 16 months ago 17 months ago 18 months ago
## 43 39 41 53 37
## 19 months ago 2 days ago 2 months ago 2 weeks ago 20 months ago
## 63 79 376 507 42
## 21 months ago 22 months ago 23 months ago 24 months ago 25 months ago
## 19 20 43 43 45
## 26 months ago 27 months ago 28 months ago 29 months ago 3 days ago
## 27 31 24 25 199
## 3 months ago 3 weeks ago 30 months ago 31 months ago 32 months ago
## 289 342 22 51 24
## 33 months ago 34 months ago 35 months ago 36 months ago 37 months ago
## 21 29 29 39 35
## 38 months ago 39 months ago 4 days ago 4 months ago 4 weeks ago
## 26 19 112 170 252
## 40 months ago 41 months ago 42 months ago 43 months ago 44 months ago
## 15 22 20 47 29
## 45 months ago 46 months ago 47 months ago 48 months ago 49 months ago
## 23 15 18 11 10
## 5 days ago 5 months ago 5 weeks ago 50 months ago 53 months ago
## 161 139 220 3 3
## 54 months ago 55 months ago 56 months ago 57 months ago 58 months ago
## 1 4 5 1 1
## 6 days ago 6 months ago 6 weeks ago 60 months ago 61 months ago
## 70 108 179 3 1
## 64 months ago 66 months ago 67 months ago 68 months ago 7 months ago
## 1 7 1 1 114
## 7 weeks ago 74 months ago 8 months ago 9 months ago a week ago
## 105 1 72 41 421
## never today yesterday
## 15 941 220
Availability updates are scattered over time, though the scraping itself is quite recent. The Airbnb data records the date of the last pull, and we assume the prices shown are active at the current time.
Further, since the cost/house-price data is available only until 2017, we assume it holds through 2019.
Ideally we would fetch 2017 prices from Airbnb to match the cost and revenue timelines, but that would cost the analysis its recency, since it would reflect a two-year-old scenario and things may have changed since. Alternatively, we could use forecasting techniques such as ARIMA to project costs to 2019.
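As a sketch of that forecasting alternative (not used in the analysis below): base R's `arima` can project a zipcode's monthly cost series forward. The order (1,1,0) is an illustrative, untuned choice, and 10025 is an arbitrary example zipcode.

```r
# monthly cost series for one zipcode, April 1996 - June 2017
costs <- ts(zillow_filLong[zipcode == "10025"][order(time)]$cost,
            start = c(1996, 4), frequency = 12)
fit <- arima(costs, order = c(1, 1, 0))   # illustrative ARIMA order, not tuned
predict(fit, n.ahead = 24)$pred           # 24 monthly forecasts, through mid-2019
```

In practice the order should be selected by model diagnostics (e.g. AIC) rather than fixed by hand.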
# Recent data most applicable.
zillow_filLong <- zillow_filLong[time=="2017-06"]
# merge the two datasets by zipcode, since only a single time overlap is available
merge_data <- merge(airbnb_filtered, zillow_filLong , by = "zipcode" )
relevantColumns<-c("zipcode","street","neighbourhood_group_cleansed","latitude","longitude","square_feet"
,"cost","price","cleaning_fee","minimum_nights"
,"maximum_nights","number_of_reviews","review_scores_rating"
,"time","property_type","room_type"
,"bedrooms")
merge_data <- merge_data[,relevantColumns,with=F]
Formatting the price and cleaning_fee columns: remove the $ symbol and commas, and convert to numeric.
merge_data$price <- (gsub("\\$","",merge_data$price))
merge_data$price <- (gsub("\\,","",merge_data$price))
merge_data$cleaning_fee <- (gsub("\\$","",merge_data$cleaning_fee))
merge_data$cleaning_fee <- (gsub("\\,","",merge_data$cleaning_fee))
merge_data$cleaning_fee <- as.numeric(merge_data$cleaning_fee)
merge_data$price <- as.numeric(merge_data$price)
gg_miss_var(merge_data, show_pct = TRUE) + labs(y = "Percentage")
Impute missing values in the cleaning_fee column: from the summary, the mean and median are close, hence the mean is chosen.
summary(merge_data$cleaning_fee)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 75.0 100.0 109.8 150.0 482.0 211
merge_data <- merge_data[, cleaning_fee:= ifelse(is.na(cleaning_fee),mean(cleaning_fee,na.rm=TRUE),cleaning_fee)]
summary(merge_data$cleaning_fee) # all NA's were imputed
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 75.0 100.0 109.8 134.0 482.0
Also, the daily rental price in the revenue data reflects the space offered, not the entire property. The price must be corrected to account for the entire property. Assumption made: if room_type == "Private room", the price is multiplied by the number of bedrooms to approximate the whole-property price. The correction is written back to the original price column.
merge_data <- merge_data[, price:=ifelse(room_type == "Private room",price * bedrooms, price)]
We assume an average stay of 3 days (inferred from a Pillow.com study), so effectively a share of the cleaning fee adds to the revenue every day. This variable is used later to calculate other variables for visualisations.
merge_data[, revenuePerDay:= price+cleaning_fee/3]
While outlier treatment is very useful in regression analysis, in a case like this its usefulness is limited to within-group (zipcode) effects. Here, extreme values are capped at the 1st and 99th percentiles.
outliers <- quantile(merge_data$revenuePerDay,probs = c(0.01,0.99))
merge_data[, revenuePerDay:= ifelse(revenuePerDay >=outliers[2],outliers[2],revenuePerDay)]
merge_data[, revenuePerDay:= ifelse(revenuePerDay <=outliers[1],outliers[1],revenuePerDay)]
summary(merge_data$revenuePerDay)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 103.0 200.3 272.7 324.1 376.7 1481.2
outliers <- quantile(merge_data$cost,probs = c(0.01,0.99))
merge_data[, cost:= ifelse(cost >=outliers[2],outliers[2],cost)]
merge_data[, cost:= ifelse(cost <=outliers[1],outliers[1],cost)]
summary(merge_data$cost)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 382300 1302300 1712900 1791108 2147000 3316500
summary(merge_data)
## zipcode street neighbourhood_group_cleansed
## Length:1565 Length:1565 Length:1565
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## latitude longitude square_feet cost
## Min. :40.52 Min. :-74.21 Min. : 0.0 Min. : 382300
## 1st Qu.:40.68 1st Qu.:-74.00 1st Qu.: 650.0 1st Qu.:1302300
## Median :40.73 Median :-73.99 Median :1000.0 Median :1712900
## Mean :40.73 Mean :-73.98 Mean : 902.3 Mean :1791108
## 3rd Qu.:40.76 3rd Qu.:-73.97 3rd Qu.:1125.0 3rd Qu.:2147000
## Max. :40.81 Max. :-73.72 Max. :1600.0 Max. :3316500
## NA's :1538
## price cleaning_fee minimum_nights maximum_nights
## Min. : 50.0 Min. : 0.0 Min. : 1.00 Min. : 1
## 1st Qu.: 175.0 1st Qu.: 75.0 1st Qu.: 2.00 1st Qu.: 30
## Median : 239.0 Median :100.0 Median : 3.00 Median : 1125
## Mean : 294.2 Mean :109.8 Mean : 10.14 Mean : 13463
## 3rd Qu.: 338.0 3rd Qu.:134.0 3rd Qu.: 7.00 3rd Qu.: 1125
## Max. :4000.0 Max. :482.0 Max. :365.00 Max. :20000000
##
## number_of_reviews review_scores_rating time property_type
## Min. : 0.00 Min. : 20.00 2017-06:1565 Length:1565
## 1st Qu.: 1.00 1st Qu.: 92.00 1996-04: 0 Class :character
## Median : 4.00 Median : 96.00 1996-05: 0 Mode :character
## Mean : 19.78 Mean : 94.15 1996-06: 0
## 3rd Qu.: 17.00 3rd Qu.:100.00 1996-07: 0
## Max. :403.00 Max. :100.00 1996-08: 0
## NA's :387 (Other): 0
## room_type bedrooms revenuePerDay
## Length:1565 Min. :2 Min. : 103.0
## Class :character 1st Qu.:2 1st Qu.: 200.3
## Mode :character Median :2 Median : 272.7
## Mean :2 Mean : 324.1
## 3rd Qu.:2 3rd Qu.: 376.7
## Max. :2 Max. :1481.2
##
glimpse(merge_data)
## Observations: 1,565
## Variables: 18
## $ zipcode <chr> "10003", "10003", "10003", "10003",…
## $ street <chr> "New York, NY, United States", "New…
## $ neighbourhood_group_cleansed <chr> "Manhattan", "Manhattan", "Manhatta…
## $ latitude <dbl> 40.72577, 40.72939, 40.73476, 40.73…
## $ longitude <dbl> -73.98745, -73.98857, -73.98452, -7…
## $ square_feet <int> NA, 1100, NA, 1000, NA, NA, 800, NA…
## $ cost <dbl> 2147000, 2147000, 2147000, 2147000,…
## $ price <dbl> 249, 189, 400, 245, 549, 151, 240, …
## $ cleaning_fee <dbl> 85.0000, 115.0000, 120.0000, 200.00…
## $ minimum_nights <int> 5, 2, 2, 5, 30, 2, 5, 2, 5, 30, 2, …
## $ maximum_nights <int> 365, 28, 8, 1125, 365, 28, 365, 112…
## $ number_of_reviews <int> 166, 403, 105, 18, 58, 214, 86, 66,…
## $ review_scores_rating <int> 85, 86, 94, 96, 94, 84, 96, 97, 100…
## $ time <fct> 2017-06, 2017-06, 2017-06, 2017-06,…
## $ property_type <chr> "Apartment", "Apartment", "Apartmen…
## $ room_type <chr> "Entire home/apt", "Entire home/apt…
## $ bedrooms <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ revenuePerDay <dbl> 277.3333, 227.3333, 440.0000, 311.6…
In this approach, we calculate the revenue-to-cost ratio for the first year, using our assumption that an average stay is 3 days, so effectively a share of the cleaning fee adds to the revenue every day. We also assume a 75% occupancy rate throughout the year.
We calculate this ratio at record level and then aggregate it at zipcode level, which helps us identify zipcodes with better returns.
merge_data[,revenueToCostRatio:= (revenuePerDay*365*.75)/cost * 100]
merge_data_summary<- merge_data[, .(averageRevenueToCostRatio=mean(revenueToCostRatio), AvgCost= mean(cost), count= length(cost)), by=.(zipcode)]
rentalReturn<- ggplot(merge_data_summary, aes(x = AvgCost, y = averageRevenueToCostRatio)) +geom_point(aes(colour=zipcode, size= count)) + labs(y = "Average Revenue To Cost Ratio", x = "Average Cost") + scale_x_continuous(labels = scales::comma) + theme_clean()
ggplotly(rentalReturn)
merge_data_summary <- merge_data_summary[order(-averageRevenueToCostRatio)]
topRentalReturn <- head(unique(merge_data_summary$zipcode),15)
print(topRentalReturn)
## [1] "10308" "11003" "11434" "10305" "10306" "11234" "10303" "10314"
## [9] "10304" "10309" "10025" "10036" "11231" "11215" "11201"
The first approach considered the revenue (rent) generated from the property relative to its cost over a year. In this approach, we assume the property is held indefinitely and calculate the number of years it will take to break even, i.e. the number of years needed to recover the initial investment.
Setting our variables (we assume a 75% occupancy rate throughout the year for these properties).
merge_data[,totalAnnualIncome:= revenuePerDay*365*.75]
merge_data[,yearsToProfit:= cost/totalAnnualIncome]
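A quick arithmetic check of the two metrics on illustrative numbers (hypothetical values, not taken from the data):

```r
revenuePerDay <- 300                             # hypothetical daily revenue
cost          <- 1700000                         # hypothetical purchase cost
annualIncome  <- revenuePerDay * 365 * 0.75      # 75% occupancy -> 82125
annualIncome / cost * 100                        # revenueToCostRatio: ~4.83% recovered per year
cost / annualIncome                              # yearsToProfit: ~20.7 years to break even
```

Note that the two metrics are reciprocals (up to the factor of 100), so they rank zipcodes identically at record level; differences in the top-15 lists below arise only from averaging within zipcodes.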
merge_data_year_return<- merge_data[, .(averageYearsToProfit=mean(yearsToProfit), AvgCost= mean(cost)), by=.(zipcode)]
yearReturn<- ggplot(merge_data_year_return, aes(x = reorder(zipcode, -averageYearsToProfit), y = averageYearsToProfit)) + geom_bar(stat="identity", width = .3) + labs(y = "Average Years to Profit", x = "Zip Code") + coord_flip() + theme_clean()
ggplotly(yearReturn,tooltip = c("zipcode","averageYearsToProfit"))
As we can see, the properties at the top are the quickest to reach their breakeven point.
merge_data_year_return <- merge_data_year_return[order(averageYearsToProfit)]
topYearReturn <- head(unique(merge_data_year_return$zipcode),15)
print(topYearReturn)
## [1] "10308" "11003" "11434" "10306" "10305" "11234" "10303" "10314"
## [9] "10304" "10309" "11215" "10036" "11201" "10025" "11231"
Property appreciation (real estate appreciation) is a simple concept: it refers to how the value of an investment property increases with time.
Since a property itself can appreciate or depreciate in value, this helps in understanding which properties would be a better investment.
Creating a dataset with average prices across years and zipcodes (long format).
zillow_filLong2 <- melt(zillow_filtered,id=c("RegionID","zipcode","City","State","Metro","CountyName","SizeRank"))
zillow_filLong2[, year:=substr(variable,1,4) ]
annual_cost <- zillow_filLong2[,.(avg_value=mean(value,na.rm=TRUE)) , by=.(year, zipcode)]
h<-as.data.table(table(is.nan(annual_cost$avg_value),annual_cost$year))
h<- h[order(-N)]
print(head(h,20))
## V1 V2 N
## 1: FALSE 2011 475
## 2: FALSE 2012 475
## 3: FALSE 2013 475
## 4: FALSE 2014 475
## 5: FALSE 2015 475
## 6: FALSE 2016 475
## 7: FALSE 2017 475
## 8: FALSE 2010 474
## 9: FALSE 2009 425
## 10: FALSE 2007 423
## 11: FALSE 2008 423
## 12: FALSE 2006 421
## 13: FALSE 2005 419
## 14: FALSE 2004 413
## 15: FALSE 2003 399
## 16: FALSE 2002 391
## 17: FALSE 2001 389
## 18: FALSE 2000 387
## 19: FALSE 1999 376
## 20: FALSE 1998 375
# no missings since 2011
annual_cost <- annual_cost[year>=2011,]
annual_cost <- merge(annual_cost, unique(airbnb_filtered[,c("zipcode","neighbourhood_group_cleansed"),with=F]))
# plotting the results
costPlot <- ggplot(annual_cost,aes(x = year,
y = avg_value,
group = zipcode,
colour = neighbourhood_group_cleansed
)) + labs(y = "Average Value", x = "Year") + geom_line() + geom_point() + scale_y_continuous(labels = scales::comma) + theme_clean()
ggplotly(costPlot)
There are steep lines across different price ranges, so a more meaningful analysis is to compute the return on prices over time.
Rate of Return on property from Price Growth over 2011-17
# create a dataset with avg house prices across years and zipcodes (wide format)
annual_cost_wide <- dcast.data.table(zipcode+neighbourhood_group_cleansed ~ year, data= annual_cost, value.var = "avg_value")
annual_cost_wide[, growthRate:= (`2017`/`2011`-1)*100]
annual_cost_wide$zipcode <- as.character(annual_cost_wide$zipcode)
growthPlot<- ggplot(annual_cost_wide, aes(x = `2017`, y = growthRate, colour=zipcode)) + geom_point()+ scale_y_continuous(labels = scales::comma) +scale_x_continuous(labels = scales::comma)+ labs(y = "Growth Rate", x = "2017 Prices") + theme_clean()
ggplotly(growthPlot)
This shows that mid-range-priced property zones offer the highest return in terms of property valuation.
annual_cost_wide<- annual_cost_wide[order(-growthRate)]
# check if these zipcodes are available in the revenue data
topGrowing <- unique(annual_cost_wide$zipcode[annual_cost_wide$zipcode %in% merge_data$zipcode])
topGrowing <- head(topGrowing,15)
print(topGrowing)
## [1] "11217" "11231" "11201" "11215" "10128" "10011" "10025" "10014"
## [9] "10028" "11434" "10003" "10023" "10013" "10021" "10036"
A combination of revenue and property price growth potential has to be considered when arriving at a decision.
topRev <- intersect(topYearReturn,topRentalReturn)
print(topRev)
## [1] "10308" "11003" "11434" "10306" "10305" "11234" "10303" "10314"
## [9] "10304" "10309" "11215" "10036" "11201" "10025" "11231"
topRevGrowing <- intersect(topRev,topGrowing)
print(topRevGrowing)
## [1] "11434" "11215" "10036" "11201" "10025" "11231"
Lower-priced property that ranks high on the revenue criteria is preferred, since it lowers the potential cost of funding and the investment at risk.
merge_data_summary_sec <- merge_data_summary[zipcode %in% topRev]
cheapPricePlot<- ggplot(merge_data_summary_sec, aes(x = averageRevenueToCostRatio, y = AvgCost, colour=zipcode)) + geom_point()+ scale_y_continuous(labels = scales::comma) +scale_x_continuous(labels = scales::comma) + labs(y = "Avg Price", x = "Average Revenue to Cost Ratio") + theme_clean()
ggplotly(cheapPricePlot)
merge_data_summary_sec<- merge_data_summary_sec[order(AvgCost)]
topCheapBuys <- head(unique(merge_data_summary_sec$zipcode),15)
print(topCheapBuys)
## [1] "11003" "11434" "10306" "10303" "10314" "10304" "10309" "10308"
## [9] "10305" "11234" "11215" "11231" "11201" "10025" "10036"
zipCodeSFinal <- intersect(topRevGrowing,topCheapBuys)
print(zipCodeSFinal)
## [1] "11434" "11215" "10036" "11201" "10025" "11231"
Zipcode 10025 in Manhattan is not far behind: it is also present in all 3 measures and is cheaper than 10036 with a better growthRate, making it the best property to buy in Manhattan.
Zipcode 11231 in Brooklyn is very similar to zipcode 11215, in that the avg. cost of a 2-bedroom property and the avg. revenue per year obtained from a property are quite close.
Zipcode 11434 in Queens breaks even the quickest of the lot recommended here, since its properties are cheaper; the only downside is a below-average growthRate.